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GEOMETRIC  PROPERTIES  OF  THE  MONOTONIC  LOGICAL  GRID 
ALGORITHM  FOR  NEAR  NEIGHBOR  CALCULATIONS 


1.  Background 

This  paper  analyzes  an  efficient  algorithm  for  keeping  track  of  “near  neighbor”  rela¬ 
tionships  among  a  large  number  of  nodes,  i.e.  locations,  objects  or  particles,  in  a  region  of 
3D  space.  The  need  to  treat  “near  neighbor”  interactions  applies  to  any  system  where: 

1)  The  node  positions  change  due  to  particle  velocity,  local  fluid  velocity  or  changing 
view  point.  The  neighborhood  of  each  node  is  subject  to  continual  change  as  some 
nodes  move  closer  and  others  away. 

2)  Nearby  node  pairs  interact.  The  interaction  could  be  an  interparticle  force  or  the  rate 
of  exchange  of  some  quantity.  Other  relationships  include  geometric  obscuration  or 
graphical  hidden  line  removal. 

3)  One  can  define  a  “cutoff1  separation  or  radius  Rc  according  to  the  type  of  interac¬ 
tion  considered.  For  intemode  separations  greater  than  Rc  the  interactions  may  be 
neglected,  computed  through  some  other  approximation,  or  included  through  interac¬ 
tions  with  nearer  nodes. 

For  a  large  system  of  N  nodes,  it  is  advantageous  to  compute  the  interactions  of  each 
node  with  only  a  relatively  few  near  neighbors.  Pairwise  interactions  are  only  computed 
when  the  relative  separation  of  the  two  nodes  is  less  than  the  “cutoff  radius”. 

It  follows,  for  each  node  in  a  system  of  N  nodes,  that  one  must  make  .V  -  1  “cutoff1 
tests  when  no  algorithm  is  available  to  identify  the  near  neighbors.  Consequently,  the  op¬ 
eration  count  for  this  distance  checking  procedure  scales  as  N-.  Even  when  an  interaction 
is  neglected  because  \Ra  —  R<, j  >  Rc ,  checking  the  separation  distance  requires  about  10 
floating  point  operations  per  pair,  a  substantial  fraction  cf  the  work  needed  to  calculate 
the  entire  interaction. 

The  operation  count  to  identify  near  neighbors  can  be  reduced  significantly  when 
node  coordinates  are  ordered  such  that  cutoff  separation  tests  need  only  be  performed 
over  a  small  subset  (.V,)  of  the  total  number  of  nodes  in  the  system.  Scalar  sorting 
procedures  have  been  developed  for  this  purpose  with  operation  counts  scaling  iineariy 
with  .V  ,1,5;.  Because  of  the  relatively  slow  scalar  operations  required  in  these  algorithms 
to  keep  track  of  near  neighbors,  however,  the  computational  cost  is  still  prohibitive  for  large 
3D  systems  using  vector  oriented  or  synchronous,  parallel  processing  super  computers.  The 
communications  and  data  structures  for  these  scalar  algorithms  are  also  not  optimum  for 
the  fastest  computers  available. 

Neighbor  list  techniques  6]  which  are  vectcrizable  have  been  developed,  however,  the 
structure  of  these  algorithms  have  large  storage  requirements  and  are  thus  not  applicable 
to  large  systems. 

Boris  ’2)  and  Boris  and  Lambrakos  '3^  have  developed  an  algorithm  for  keeping  track  of 
near  neighbor  interactions  and  geometric  relationships  which  scales  as  .V  and  is  structured 
to  permit  optimized  vector  and  parallel  processor  implementations.  This  development 
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followed  from  efforts  on  the  “nearest  neighbors”  problem  begun  with  Dr.  K.V.  Roberts  [7] 
at  Culham  Laboratory  in  the  context  of  gravitationally  attracting  stars.  There  we  choose 
a  field-solver  approach  to  finding  the  forces  not  only  because  of  the  long  range  nature  of 
the  gravitational  force  but  also  because  of  the  long  range  nature  of  the  gravitational  force 
but  also  because  a  good  nearest  neighbors  algorithm  was  lacking. 

The  new  algorithm  uses  a  Monotonic  Logical  Grid  (MLG)  for  indexing  the  geometric 
positions  and  other  dynamical  attributes  of  the  moving  nodes  in  computer  memory.  The 
indexing  ensures  that  nodes  which  are  adjacent  in  real  space  we  given  MLG  indices  which 
are  also  very  close.  When  two  adjacent  nodes  pass  each  other  in  real  space,  relative  to 
the  chosen  coordinate  direction,  their  indices  are  exchanged  or  “swapped”  in  the  MLG  by 
moving  the  data  for  each  node  from  its  original  indexed  location  to  the  indexed  location 
of  the  other  node  which  it  passed  in  space.  The  data  for  the  nodes  are  “swapped”  in  the 
computer  memory  cells.  This  local  “swapping"  maintains  a  monotone  mapping  between 
the  instantaneous  positions  of  the  nodes  in  real  space  and  their  MLG  indices.  The  ordinal 
node  locations  within  the  compact,  regular  MLG  arrays  are  the  same  as  the  ordinal  node 
locations  in  space.  Node  positions  or  any  node  attributes  indexed  in  computer  memory 
according  to  this  scheme  are  said  to  be  in  “MLG  order” . 

The  speed  of  the  MLG  algorithm  depends  on  1)  its  easy  vectorization;  2)  the  rapid 
convergence  of  the  “swapping”  procedure,  to  restore  the  MLG;  3)  the  average  distance  the 
nodes  travel  between  MLG  reorderings  being  large;  4)  the  number  of  interactions  for  each 
node;  and  5)  the  computational  cost  of  computing  them.  In  our  tests  and  applications  to 
date  the  cost  in  computer  time  of  swapping  iterations  scales  as  C\  x  .V  x  log2(N)  while 
the  cost  of  calculating  pair  interactions  scales  as  C2  x  N.  So  much  work  is  done  per  node 
to  calculate  the  pair  interactions,  however,  that  C i  x  log2[N )  is  of  order  0.04  x  C2  for 
.V  =  512.  When  N  =  262,144,  log2(N)  i3  a  factor  of  two  larger  and  the  cost  of  resetting 
the  MLG  increases  to  3%  of  the  interaction  calculations. 

This  paper  presents  an  analysis  and  statistical  results  of  the  MLG  algorithm  applied 
to  the  random  motion  of  point  nodes  in  a  cubical  domain  which  is  periodic  in  X  and  Y  and 
bounded  in  Z  by  two  reflecting  walls.  The  nodes  are  non-interacting  and  have  a  random 
distribution  of  initial  velocities.  The  two  major  aspects  of  the  MLG  algorithm  considered 
here  are  the  convergence  of  the  “swapping"  algorithm  to  maintain  MLG  ordering  and  the 
spatial  properties  of  the  particular  near  neighbor  indexing  grid  %ve  have  used,  the  “skew- 
periodic"  MLG  designed  to  facilitate  long  vector  operations.  This  paper  examines  an 
MLG  comprised  of  NZ  identical  logical  pianes.  Node  locations  within  each  k-plane  are 
indexed  via  a  “skew-periodic”  two-dimensional  grid.  The  skew  periodic  indexing  scheme 
is  described  and  analyzed  in  §4. 

“Swapping"  to  maintain  the  monotone  mapping  is  an  iterative  process.  Its  conver¬ 
gence  rate  depends  on  the  extent  to  which  relative  node  positions  are  perturbed  between 
restoration  of  the  MLG,  i.e.  how  far  the  nodes  move,  and  on  the  size  of  the  grid.  Anal¬ 
ysis  of  “swapping”  iteration  level  frequencies  show  a  better  integrated  convergence  rate 
for  “large"  changes  in  the  positions  of  the  nodes,  i.e.  long  timesteps  resulting  in  signif¬ 
icant  MLG  distortion.  This  result  is  extremely  encouraging  as  it  implies  that  the  MLG 
swapping  procedure  is  also  a  good  way  to  order  completely  disordered  nodes.  This  pa¬ 
per  presents  frequency  distributions  of  “swapping”  iterations  and  “swapping”  convergence 
characteristics  for  different  timestep  sizes  and  system  sizes. 


The  MLG  algorithm  is  adaptable  to  a  wide  range  of  applications  including  important 
problems  in  astrophysics,  molecular  dynamics  and  fluid  dynamics  which  require  calculation 
of  near  neighbor  interactions  for  a  large  number  of  nodes  whose  relative  positions  can 
change  in  a  variety  of  ways. 

2.  Near  Neighbors  Template  for  Monotonic  Logical  Grid 
Consisting  of  Indexed  Planes 

The  “order  JV”  scaling  of  the  MLG  algorithm  is  effected  by  restricting  the  compu¬ 
tations  of  node-node  interactions  to  a  finite  set  of  small  index  offsets  in  the  MLG  which 
correspond  to  the  near  neighbors  in  space.  The  size  and  configuration  of  this  set  of  index 
offsets,  termed  the  Near  Neighbors  Template  (NNT),  influences  the  coefficient  of  the  MLG 
cost  which  scales  with  .V. 

The  NNT  can  be  visualized  as  a  cluster  of  nodes,  the  near  neighbors,  surrounding  a 
“target  node”  (TN).  If  a  particular  node  is  taken  as  the  target,  the  remaining  nodes  of 
the  “template”  define  a  local  “pattern”  in  the  MLG  corresponding  to  the  relative  index 
offsets  of  what  can  be  assumed  to  be  near  neighbors  of  the  target  node.  Typical  NNTs  with 
different  upper  bounds  for  the  logical  displacement  of  near  neighbors  (i.e.  shells)  are  shown 
in  Fig.  1.  Only  index  offsets  larger  than  zero  need  be  considered  since  ail  interactions  with 
nodes  having  a  negative  address  offset  will  have  been  calculated  previously  when  those 
nodes  were  target  nodes.  Three  shells  of  interaction  are  defined  in  Fig.  1  corresponding 
(approximately)  to  neighbors  at  different  probable  separations.  The  16  neighboring  nodes 
indicated  with  squares  form  the  closest  shell.  The  30  triangle  nodes  are  on  average  further 
away  and  the  16  circle  nodes  are  yet  further  away.  The  full  5x5x5  cubical  template  shown 
in  Fig.  1.  nominally  includes  125  nodes.  Since  the  target  node  does  not  interact  with  itself 
and  each  interaction  does  not  need  to  be  counted  twice,  there  are  =  62  interactions 

considered  for  each  node.  To  complete  the  shell  of  circle  points  requires  considering  nodes 
still  further  from  the  target  cell  than  two  layers  in  each  direction. 

In  general,  there  should  be  a  correlation  between  the  size  of  an  NNT,  in  terms  of 
the  number  of  nodes  included  in  the  logical  near  neighborhood,  and  the  average  distance 
between  nodes  in  the  system.  However,  the  NNT  size  and  configuration  is  controlled  by 
the  probability,  as  a  function  of  MLG  index,  of  a  neighboring  node  having  a  separation 
less  than  Rc.  The  NNT  should  be  taken  large  enough  that  the  likelihood  cf  a  nearmiss, 
that  is  a  node  which  is  outside  the  template  being  found  within  Rc  of  the  target  node,  is 
acceptably  small. 

The  characteristics  of  a  Near  Neighbors  Template  will  depend  on  the  particular  mono¬ 
tone  mapping  impiemented  between  the  spatial  locations  of  nodes  and  the  corresponding 
locations  in  the  computer  memory.  When  the  average  node  separation  in  Z ,  for  example, 
is  half  of  the  average  separation  in  .Y  or  7,  the  NNT  probably  should  reach  more  MLG 
index  layers  in  the  k  direction  (aicng  Z).  In  fact,  there  usually  exists  more  than  one  MLG 
defining  a  monotone  spacs-co-index  mapping.  This  property  of  MLGs  provides  latitude  for 
further  optimization  with  respect  to  particular  problems.  For  example,  an  optimum  MLG 
for  one  problem  may  minimize  distances  to  near  neigbors.  In  another  problem,  the  MLG 
may  be  optimized  when  that  the  shortest  distance  to  r.cn-near  neighbors  is  maximized. 
This  paper  examines  the  properties  of  a  simple  NNT  of  the  type  shown  in  Fig.  1..  applied 
to  a  skew-oeriedic  MLG. 
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The  statistical  analysis  of  near  neighbor  locations  which  follows  is  discussed  in  terms 
of  the  (NNT)  described  below.  This  analysis  considers  the  following  questions: 

(1)  What  is  the  correspondence  between  relative  index  offsets  of  nodes  in  the  MLG  and 
the  corresponding  relative  spatial  positions?  What  is  the  average  separation  in  space 
of  two  nodes  which  are  logically  adjacent?  How  does  this  average  depend  on  the  offsets 
of  the  MLG  indices  between  the  two  nodes? 

(2)  How  does  the  necessary  configuration  and  size  of  a  NNT  depend  on  the  specific  type 
of  MLG  used  for  indexing  node  positions,  i.e.  the  specific  indexing  scheme?  In  par¬ 
ticular,  what  characteristics  of  a  skew-periodic  MLG  might  require  or  benefit  from  a 
modification  of  the  NNT  configuration? 

(3)  For  a  given  MLG,  what  methods  are  available  for  optimizing  the  configuration  of  an 
NNT  and  how  does  node  motion  affect  this  optimization?  For  example,  for  nodes 
moving  randomly  in  3D  space  the  probability  distribution  for  the  relative  separation 
of  near  neighbors  is  spherically  symmetric. 

This  symmetry  can  be  used  to  reduced  NNT  size  without  inhibiting  the  accuracy  of 
an  algorithm  to  compute  interactions  based  on  a  particular  MLG. 

The  appropriate  NNT  of  course  depends  on  the  specific  MLG  scheme  selected.  The 
MLG  considered  in  this  analysis  is  a  skew-periodic  grid  as  described  in  $4-  It  consists  of  8 
logical  planes  each  consisting  of  64  logical  ceils  arranged  in  an  8  x  8  array.  For  this  MLG 
the  number  of  node-node  interactions  computed  each  timestep  depends  on  the  number  of 
inter-  and  intraplane  interactions  indexed  by  the  NNT  for  each  “target  node’’ .  An  example 
of  the  computational  cost  coefficient  in  front  of  the  order  N  scaling  of  the  MLG  algorithm 
is  given  in  the  appendix. 

3.  Statistical  Analysis  of  Near  Neighbor  Positions 
for  Points  in  3D  Random  Motion 


Interpreting  statistical  information  concerning  spatial  relationships  and  correlations 
between  positions  in  the  MLG  requires  specifying  the  parameters  which  affect  these  posi¬ 
tions  each  timestep.  For  any  system  of  nodes  seme  of  the  major  parameters  are: 

(1)  The  size  of  the  spatial  domain  relative  to  the  number  of  nodes  comprising  the  system, 
i.e.  the  node  density. 

(2)  The  nature  of  the  motion  of  the  node  system,  e.g.  random  or  nonrandom,  rotational, 
compressional,  anisotropic,  etc. 

(3)  The  logical  structure  of  the  MLG  used  to  index  the  node  positions. 

The  statistical  analysis  described  in  this  section  considers  a  system  of  512  non- 
interacting  points.  The  velocities  of  these  points  was  random  and  uniformly  distributed 
in  each  coordinate  from  -1  x  107  cm,  sec  to  1  x  10r  cm  sec  .  The  spatiai  domain. 
30  Ax  80  Ax  80  A,  corresponds  to  an  average  separation  of  adjacent  nodes  of  approximately 
10  A  in  each  coordinate,  roughly  the  density  of  gas  near  5TP. 

Useful  statistical  information  about  neighbor  positions  is  obtained  by  analyzing  the 
occupation  frequency  and  distribution  cf  neighbor-target  node  separations  for  the  different 
NNT  offsets.  These  distributions  are  accumulated  over  a  sufficiently  large  number  cf 
timesteps  that  Suctuations  have  become  small.  For  the  system  considered  here,  we  define 
a  frequency  distribution  function  f(R,::vj.k),  where 
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f(R;i,j,k)  =  Frequency  for  a  near  neighbor  node  (with 
NNT  offsets  i,  j,  k)  having  a  separation 
from  the  target  node  contained  in  the  shell 
extending  from  radius  Rto  Rt  DR. 

The  distance  classification  interval  DR  is  adjusted  according  to  the  number  of  tixnesteps 
used. 

Shown  in  Fig.  2.  are  probability  distributions  for  neighbor  separations  corresponding 
to  two  NNT  offsets.  In  computing  these  distributions  all  512  nodes  were  sampled.  The 
distance  classification  interval  DR  for  each  of  these  distributions  was  .3  A.  The  time 
sampling  interval  used  in  computing  f(R:i  j^k)  for  each  of  the  NNT  offsets  consisted  of  five 
hundred  timesteps  of  length  2.5  x  10“ 16  second.  This  time  interval  is  sufficiently  long  that 
the  high  velocity  nodes  easily  traverse  the  spatial  domain,  i.e.  80  A,  several  times.  Fig.  2. 
shows  a  correlation  between  the  mean  value  of  f(R;ij,k)  and  the  NNT  offsets.  The  peak 
of  the  distribution  moves  approximately  10  A  toward  larger  separations  (from  21  A  to  31 
A)  when  one  more  node  is  added  between  target  node  and  the  interaction  node. 

Shown  in  Fig.  3.  are  frequencies  for  neighbor  separations  at  short  range  corresponding 
to  an  NNT  cell  having  a  relatively  large  index  offset  from  the  target  node,  i.e.  the  prob¬ 
ability  distribution  in  Fig.  2.  corresponding  to  NNT  offset  (-3,1,0).  As  can  be  seen,  even 
for  relatively  large  index  offsets  from  the  target  node  there  is  a  small  but  finite  probability 
for  a  not  so  near  neighbor  coming  quite  “close” .  It  is  these  rare  “near  miss”  events  which 
determine  the  required  NNT  size. 

Additional  information  concerning  neighbor  positions  is  provided  by  cumulative  in¬ 
tegrals  over  f(R;ij,k)  from  0  to  a  given  separation  from  the  target  node.  These  integrals 
give  the  probability  the  field  nodes  with  a  given  offset  from  their  target  node  come  within 
a  particular  distance  of  the  target  node  in  space,  a  statistical  “near  miss”  probability. 
Such  information  provides  a  criterion  for  optimizing  (or  minimizing)  the  NNT  based  on 
the  “cutoff  radius”  Rc  for  the  particular  system.  NNT  optimization  is  effected  via  analysis 
of  the  near  miss  frequency  for  each  NNT  offset,  i.e.  the  cumulative  probability  of  finding 
a  node  indexed  outside  the  NNT  but  within  a  distance  Rc  of  the  target  node.  Shown 
in  Fig.  4.  are  the  near  miss  probabilities  as  a  function  of  NNT  offset  for  the  particu¬ 
lar  case  Rc  =  3A.  These  probabilities  were  computed  with  the  same  sampling  used  for 
the  frequency  functions  shewn  in  Figs.  2  and  3.  NNT  optimization  based  on  near  miss 
probability  is  discussed  in  30. 

For  the  system  of  nodes  represented  in  Fig.  4,  i.e.  a  system  where  the  cutoff  radius 
Rc  =  3A,  ail  NNT  ceils  more  than  three  index  offsets  from  the  “target  ceil”  in  any  direction, 
have  zero  probability  of  recording  a  “near  miss”,  i.e.  coming  within  3  A  of  the  target  cell. 
For  this  system  a  suitable  upper  bound  on  the  iogicai  separation  included  in  the  Near 
Neighbors  Template  is  therefore  .V„.  =  3.  For  this  case  (referring  to  the  appendix)  the 
total  number  of  node-node  interactions  computed  each  timestep  for  a  MLG  cf  size  .V  ;s 


Computed  Interactions  =  171  x  N  -  294  x  (.V*).  (3.2, 

The  NNT  corresponding  to  (3.2)  was  selected  on  the  basis  of  an  upper  bound,  .Ve,  on 
the  MLG  location  of  “nearest  neighbors”.  However,  as  seen  in  Fig.  4,  further  deletion  cf 
NNT  cells  is  possible  using  an  NNT  which  is  more  nearly  spherical.  This  aspect  of  NNT 
optimization  is  examined  presently  for  the  “vector  compatible'1  skew-periodic  MLG. 


4.  A  “Skew  Periodic”  MIG  for  Indexing  the  Geometric  Positions 
of  Nodes  in  Computer  Memory 


More  than  one  type  of  MLG  for  indexing  nodes  within  a  given  spatial  domain  exists. 
In  particular,  for  nodes  in  a  3D  spatial  domain  with  periodic  boundary  conditions,  a  Mono¬ 
tonic  Logical  Grid  can  be  constructed  which  permits  more  efficient  vector  manipulation  of 
node  attributes  than  the  regular  periodic  grid.  This  so  called  “skew  periodic”  MLG  also 
results  in  an  average  partitioning  of  the  computational  domain  in  cells  of  equal  statistical 
volume. 

“Position  skewing”  is  a  statistical  property  of  a  system  of  nodes  such  that  the  average 
relative  location  of  neighbor  nodes  is  a  function  of  the  direction  of  the  MLG  coordinate 
offset  relative  to  the  target  node.  These  properties  follow  from  the  asymmetric  constraints 
of  indexing  node  positions  in  a  skew  periodic  lattice.  In  a  skew  periodic  MLG  the  mapping 
between  the  spatial  coordinates  of  the  neighbors  and  their  MLG  indices  is  not  exactly 
aligned,  as  shown  clearly  in  Figs.  7  and  8.  It  is  shown  presently  that  position  skewing 
depends  on  the  MLG  configuration  and  diminishes  with  increased  size  of  the  node  system. 

Given  N  nodes  randomly  located  in  a  region  of  3D  space,  one  can  associate  with 
each  node  not  only  its  spatial  coordinates  (X,  Y,Z)  but  also  a  set  of  MLG  indices  (ij,k). 
A  useful  mapping,  which  we  have  named  a  Monotonic  Logical  Grid,  obtains  when  the 
node  locations  in  space  and  the  node  indices  in  the  computer  memory  satisfy*  a  set  of 
monotonicity  conditions, 


X{i,j,k)  <  X(i  -  1  ,j,k) 
Y(iJ,k)<Y{i,j-l,k) 
and  Z{i,j,k)<Z[i,j,k  —  l) 


1 < i < NX  - 1 
1  <  ;'  <  NY  -  1  (4.1) 

1  <  k  <  NZ  -  1. 


Here  N,  the  total  number  of  nodes,  equals  NX  x  NY  x  NZ.  Note  that  the  average 
separation  of  neighbors  is  not  independent  of  the  direction  of  their  MLG  index  displacement 
since  the  metrics  of  the  X,  Y  and  Z  coordinates  in  the  spatial  domain  need  not  be  equal 
(ncr  even  orthogonal).  The  MLG  conditions  defined  in  (4.1),  although  mathematically 
satisfactery  for  organizing  random  locations  in  space,  is  not  optimal  for  mapping  node 
positions  into  a  vector  computer  memory.  A  skew  periodic  MLG  is  described  in  14  which 
is  more  efficient. 

The  MLG  is  comprised  of  a  set  of  logical  planes,  NZ ,  which  are  each  skew-periodic. 
Thus,  for  all  nodes  in  the  system  there  will  be  one  space  coordinate,  say  Z ,  and  one  MLG 
piane  index,  say  k.  which  must  satisfy  the  mono  tonicity  condition  Z{k)  <  Z(k— 1)  for  1 
<  k  <  NZ  -  1.  Given  that  .V  =  NX  <  NY  x  NZ.  each  logical  piane  of  the  MLG  will 
index  NX  x  NY  nodes  randomly  located  in  3D  space.  Complete  compact  vectorization 
of  eacr.  piane  is  then  effected  by  indexing  in  monotcnic  order  the  locations  of  the  "first” 
NX  nodes  and  selected  periodic  images  of  the  other  NX)  <  NY  -  1  nodes  which  are 
ail  at  assigned  distances  from  the  actual  node  positions.  This  indexing  scr.eme  provides 
a  mapping  of  3D  space  onto  a  single  continuous  MLC  coordinate  axis.  Such  a  mapping 
satisfies  siightiy  different  mcnoconicitv  conditions  from  '4.1). 


X\ij,k)  <  X {'.]  —  1.  k) 
Y[ij,k )  <  Y\ij  -  NX.  k) 
and  Z'ij,k)  <  Z{ij.k  -  1) 


1  <  ij  <  NX  <  NY 
1  <  j;  <  .V.Y  <  .W  (4.3) 

1  <  k  <  NZ. 


6 


The  average  distance  of  neighboring  nodes  from  the  target  node  is  recorded  in  Fig. 
9a  for  the  system  of  512  points  with  random  positions.  Note  that  each  NNT  node  has  an 
average  RMS  distance  to  the  target  node  which  is  only  about  a  factor  of  %/2  longer  (or 
less)  than  the  corresponding  distance  in  a  perfectly  regular  Eulerian  grid.  Shown  in  Figs. 
9b  and  9c  are  the  average  separations  of  neighbors  in  X  and  Y,  respectively.  The  average 
X  separation  is  exactly  what  we  would  expect,  10  A.  It  is  seen  in  these  figures  that  only 
the  average  Y  separations  are  affected  by  skewing  as  we  expect. 

Indexing  the  NNT  locations  using  coordinates  (:',  j)  and  taking  the  target  node  (TN) 
as  the  origin,  “position  skewing”  is  examined  by  taking  divided  differences  between  the  Y 
components  of  neighbors.  From  Fig.  9c,  for  a  8  x  8  x  8  system  of  nodes  having  random 
velocities  (top  number  in  each  cell), 

[y(3,3)  -  r(-3,3)l  _  (33.75  -  26.25) 

60  60  #y  (4’3) 


=  0.125  =  r 


NX  x  NY] 


where  the  average  node  separation  in  each  direction  is  10  A.  Because  “position  skew¬ 
ing”  scales  as  yW,  its  affect  diminishes  as  the  system  size  increases.  This  is  a  relatively 
small  price  to  pay  for  more  efficient  data  storage  and  vectorization.  From  Fig.  9c,  for  a 
16  x  16  x  16  system  of  nodes  (bottom  number  in  each  cell  shown)  having  random  velocities, 

[Y (3, 3)  —  y(— 3. 3)1  =  (31.88  -  23.13) 

60  60  (4.4) 

=  .0625  =  — , 


as  exoected. 


5.  Discussion 


The  non-orthogonai,  skew-periodic  indexing  scheme  works  well  in  the  MLG  case  be¬ 
cause  the  local  grid  in  the  various  index  directions  could  not  be  orthogonal  In  any  case 
because  the  nodes  are  Lagrangian.  Thus,  there  is  no  advantage  to  maintaining  orthogo¬ 
nality  in  node  indexing  when  actual  spatial  orthogonality  is  not  possible. 

In  large  systems  there  is  the  possibility  that  the  X  coordinates  may  lose  some  accu¬ 
racy  because  many  system  lengths  have  to  be  added  to  the  X  coordinate  when  using  the 
technique  with  one  big  3D  system.  If  skew  periodicity  were  being  used  in  a  100  x  ICO  <  100 
system,  the  image  or  point  (0.0.0)  would  be  I0.CCC  system  lengths  away,  or  10c  typical  grid 
spacings.  Cleariy  32-bit  precision  wcuid  not  be  adequate.  In  suer,  a  large  system  however, 
one  would  net  want  to  use  this  technique  in  ail  directions.  When  the  vectors  become  long 
enough,  vectorizing  in  two  dimensions  at  once  is  no  longer  so  attractive.  On  a  Cray,  for 
example,  vectors  of  length  64  are  long  enough. 

On  arrays  of  parallel  processors,  actual  wraparound  is  sometimes  implemented  in 
hardware  ar.d  skew  periodic  connectivity  is  often  extended  to  a  numoer  ot  dimensions  in 
the  hypercube  architecture.  Ln  these  cases  the  difficulties  and  costs  overcome  oy  the  skew 
periodic  MLG  in  many  conventional  vector  and  pipelined  processors  wcuid  be  absent  so 
using  the  regular  periodic  grid  might  be  simpler. 
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Although  the  physical  system  being  described  in  the  two  representations,  i.e.  Figs.  5 
and  6,  are  identical,  the  data  being  stored  to  describe  each  system  is  quite  different  because 
different  images  of  the  nodes  are  active  in  each  case.  The  X  and  Y  coordinates  of  nodes 
1,  2  and  3  are  the  same  in  both  representations.  In  the  regular  periodic  grid  the  nodes 
4,  3  and  6,  and  for  that  matter  7,  8  and  9,  all  lie  in  the  same  periodic  domain  as  above 
the  nodes  1,  2  and  3.  In  the  skew  periodic  grid,  each  successively  higher  row  of  nodes  (in 
this  case  NY  =  3  rows)  is  displaced  a  full  system  width  Lx  in  the  X  direction.  Instead  of 
following  node  3  with  the  image  of  node  1,  as  in  the  regular  periodic  grid,  node  3  is  followed 
directly  by  node  4.  Similarly,  node  6  is  followed  in  the  skew  periodic  MLG  by  node  7,  not 
the  image  of  node  4  as  in  the  regular  grid.  By  adding  the  system  length  to  each  succeeding 
row  of  X  positions,  all  nodes  are  automatically  positioned  properly  for  separations  to  be 
calculated  directly  without  concern  for  whether  the  node  in  question  is  on  the  boundary 
of  the  domain.  The  boundaries  of  the  computational  domain  have  therefore  disappeared 
in  the  X  direction. 


The  number  of  extra  storage  locations  needed  to  provide  enough  ghost  nodes  is  much 
smaller  in  the  skew  periodic  grid  since  ghost  nodes  are  only  needed  at  the  “ends”  of  the 
system,  not  at  the  “sides”.  The  “hiccups”  at  the  beginning  of  each  row  in  the  regular 
periodic  representation  have  been  eliminated  in  the  skew  periodic  representation.  As 
shown  in  Fig.  10a,  when  the  index  offsets  for  the  NNT  extend  to  the  third  layer  in 
all  directions,  as  we  must  do  in  some  of  our  molecular  dynamics  calculations,  the  31  cells 
in  the  principle  domain  and  the  3  extra  domains  are  necessary  to  set  up  enough  ghost 
cells  to  allow  vector  operations  which  span  the  entire  2D  cross-section  without  boundary 
interruptions  or  special  boundary  corrections.  In  this  case,  as  shown  in  Fig.  10b,  the  skew 
periodic  grid  again  uses  many  fewer  ghost  nodes.  Three  extra  rows  pius  three  extra  points 
on  each  end  of  the  system  gives  12  ghost  ceils  at  each  end.  The  total  number  of  points  is 
thus  33  rather  than  31,  a  substantial  savings  in  computer  memory. 

If  skew  periodicity  is  used  in  all  three  dimensions,  the  number  of  ghost  ceils  at  each  end 
is  three  planes  (27  points)  pius  three  rows  (9  points)  plus  3  points.  Thus  the  skew  periodic 
3D  grid  has  105  points  altogether  of  which  27  are  active.  The  same  system  represented  in 
a  regular  periodic  grid  needs  a  total  of  729  points  of  which  only  27  are  active.  Of  course 


the  relative  difference  is  smaller  when 


the  active  MLG  is  larger  than  3x3x3  but  the  total 


amount  of  wasted  storage  increases  rapidly  with  the  size  of  the  system. 


6.  NNT  Optimization  and  Vectorization  Based  cn 
the  Near  Miss  Probability 


The  discussion  of  “Near  Misses”  presented  here  concerns  the  node  system  described  in 
53.  The  system  of  312  noninteracting  points  represents  a  worst  case  for  NNT  optimization 
in  significant  applications  of  near  neighbor  algorithms.  For  example,  in  molecular  dynamics 
studies  the  interaction  is  a  force  which  becomes  repuisive  at  small  separations.  It  is  more 
unlikely  that  a  particle  several  NNT  ceils  away  will  be  within  Rc  of  the  “target  particie” 
if  it  is  a  finite-size  particie  rather  than  zero  sized.  There  is  zero  probability  for  more  than 
one  molecule  to  occupy  the  same  position  in  space.  A  system  of  non- inter  acting  points 
demonstrates  two  significant  aspects  of  MLG  indexing:  1)  One  is  able  to  construct  an 
optimal  NNT  in  terms  of  the  average  volume  about  the  target  node  containing  nearest 
neighbors,  i.e.  an  approximate  sphere;  and  2)  That  NNT  indexing  via  offsets  relative  to 
the  target  cell,  permits  the  computation  of  node-node  interactions  using  vector  and  parallel 
processing  methods. 
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The  first  stage  of  NNT  optimization,  i.e.  minimal  scaling  in  terms  of  (3.2),  is  to 
determine  via  statistical  analysis  an  upper  bound  on  the  MLG  index  offsets  of  nearest 
neighbors.  Such  an  analysis  is  shown  in  Fig.  11  for  the  512  node  system  for  an  interaction 
cutoff  radius  Rc  =  4.4.  Because  of  symmetry,  only  a  portion  of  the  NNT  is  considered  for 
each  of  the  logical  planes. 

Figure  11  shows  that  for  nodes  in  the  logical  plane  k  =  3  there  is  essentially  zero 
probability  for  a  “near  miss”.  Further,  for  the  logical  planes  k  =  0,  1  and  2,  the  contours 
of  constant  probability  are  seen  to  be  roughly  circular.  The  next  stage  of  NNT  optimization 
is  to  consider  only  those  logical  nodes  for  which  the  indexing  of  “close”  nodes  is  possible. 
For  the  system  desribed  in  Fig.  11,  this  would  suggest  using  an  NNT  having  roughly 
the  shape  of  a  hemisphere.  The  NNT  shape  can  be  selected  by  storing  in  separate  data 
arrays  both  the  logical  and  spatial  offsets  of  near  neighbors  defined  according  to  the  skew 
periodic  indexing  scheme  and  the  desired  NNT  shape.  The  spatial  offsets  are  the  fixed 
separations  of  the  periodic  image  nodes.  This  procedure  is  described  in  Fig.  12a.  In 


this  figure  the  array  containing  the  index  offsets,  IJOFF(IPT,K2),  is  computed  outside 
the  timing  loop.  A  similar  array  is  defined  for  the  spatial  offsets  which  are  used  in  the 
interaction  calculations. 

In  Fig.  12a,  IJN  is  the  maximum  node  index  in  the  skew-periodic  grid  in  logical  plane 
k  and  NPT  is  the  maximum  number  of  nodes  indexed  by  the  NNT  in  logical  plane  K2.  The 
procedure  in  r  ig.  12a,  however,  is  not  optimum  for  vector  oriented  computers  and  does  not 
utilize  the  vector  compatibility  of  the  skew-periodic  indexing  scheme.  Vector  processors 
are  designed  such  that  the  inner  DO-loops  of  a  computing  procedure  are  “vectorized”.  The 
procedure  shown  in  Fig.  12b  is  equivalent  to  that  shown  in  Fig.  12a  but  is  structured  to 
take  full  advantage  of  the  vector  attributes  of  skew  periodicity. 

7.  Analysis  of  Swapping  (Random  Motion) 

There  are  two  important  measures  of  “swapping”  for  a  given  node  system.  These 
are:  (1)  The  total  number  of  “swaps”  per  timestep,  and  (2)  the  convergence  rate  for  the 
“swapping”,  i.e.  the  average  number  of  “swapping”  iterations  required  to  reorganize  the 
MLG  each  timestep.  These  features  depend  on  the  extent  to  which  the  MLG  indices  are 
perturbed  from  monotonicity  each  timestep.  The  amount  of  work  to  restore  mcnotonicity 
by  “swapping”  is  therefore  a  function  of  (1)  the  timestep  and  (2)  the  number  of  nodes 
in  the  system.  An  increase  in  timestep  results  in  a  larger  perturbation  of  the  monotone 
indexing.  An  increase  in  system  size  increases  the  upper  bound  on  the  amount  of  total 
reordering  required  to  restore  monotonicity. 

The  statistical  resuits  presented  here  for  swapping  are  for  node  systems  consisting  of 
non-interacting  points.  A  qualitative  analysis  of  the  dependence  of  swapping  on  “inter¬ 


action  strength”  was  undertaken  by  introducing  central  forces  between  the  system  nodes. 


The  swapping  required  was  found  to  be  less  than  that  for  non-interacting  nodes  because 
now  nodes  often  rebound  without  passing.  Thus,  for  applications  such  as  molecular  dy¬ 
namics,  a  system  of  non-interacting  nodes  represents  a  worst  case. 

A  quantitative  description  of  the  convergence  rate  of  swapping  iterations  for  a  system 
of  non-interacting  nodes  is  given  in  Figs.  13  and  14.  Shown  in  Fig.  13  are  distributions 
of  swapping  iterations  for  three  different  timesteps  which  differ  by  a  factor  of  4.  As 
expected,  the  relatively  larger  timesteps  require  more  “swaps”  to  restore  the  MLG  Indices 
to  monotonicity.  A  sixteenfoia  increase  in  timestep,  however,  occasions  only  a  factor  of 


two  increase  in 


.u 


e  number  of  iterations.  Thus,  to  integrate  for  a  given 


:e. 


testeos  are  actually  much  more  efficient. 
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Shown  in  Fig.  14  are  distributions  of  swapping  iterations  for  systems  having  (16  x  16  x 
16)  and  (32  x  32  x  32)  nodes,  respectively.  The  larger  system,  as  expected,  requires  more 
“swaps”  to  maintain  MLG  order  for  a  given  timestep.  However,  for  the  larger  system,  the 
convergence  rate  for  swapping  is  still  comparable  to  that  for  the  smaller  system. 

For  molecular  dynamics  and  other  manybody  problems  the  timesteps  used  in  the 
calculation  of  Figs.  14  and  15  are  unrealistically  large  since  the  question  of  accuracy  for 
real  orbits  has  not  been  addressed.  An  analysis  of  the  “swapping”  iteration  was  undertaken 
for  timesteps  having  sizes  suitable  for  such  applications.  For  these  cases  no  swapping  was 
observed  for  a  significant  fraction  of  the  time  increments.  For  those  timesteps  where  swaps 
were  required,  the  maximum  swapping  iteration  level  was  about  2. 

The  JVlog3(iV)  scaling  of  the  number  of  swapping  iterations  is  demonstrated  in  Fig. 
15  for  three  different  timesteps. 

Appendix 

Derivation  of  (3.2),  an  example  of  the  linear  scaling  of  the  total  number  of  pair  inter¬ 
actions  with  the  size  of  the  system.  The  coefficient  of  this  scaling  is  a  function  of  NNT 
size. 

Let  the  MLG  indexing  nodes  in  the  system  consist  of  Ng  logical  planes  each  consisting 
of  Ng  x  Ng  logical  cells  arranged  in  an  Ng  x  Ng  array.  Next,  let  the  spatial  domain  be 
doubly  periodic  in  X  and  Y  and  bounded  in  Z  by  two  reflecting  end  walls.  And  finally, 
let  -Ve  be  the  upper  bound  on  logical  offsets  included  in  the  NNT.  Given  these  conditions, 
the  number  of  node-node  interactions  computed  each  timestep  can  be  described  using  the 
following  semitriangular  array  of  logical  plane  index,  k,  pairs. 


(1,1) 

(2,2) 

(3,3) 

(4,4)  (5,5)  (6,6) 

(1,2) 

(2,3) 

(3,-1) 

(4,5)  (5,6) 

(1,3) 

(2,4) 

(3,5) 

(4,6)  ...  (Ng  -  2, 

(1,-Ve-l) 


(y,  -  ,v„.v,) 


U.i) 


These  index  pairs  represents  ail  possible  inter-  and  intraplar.e  node-node  interaction 
calculations.  The  number  of  different  inter-  and  intraplane  k  index  pairs  are  (.Vc  x  X3  — 
:v- ~ 1 ]  ana  respectively.  For  each  k  index  pair  corresponding  to  Lnterpiane  calcula¬ 
tions,  i.e.  (kl,k2)  where  kl  =  k2,  one  can  associate  _"(2.YC  —  1)*;  x  [X‘]  interceil  calculations. 
Similarly,  for  each  k  index  pair  (kl,k2)  where  kl  =  k2,  one  can  associate  "2.V?  —  2 Nt]  x  (.V*; 
intercell  calculations.  It  follows  that  the  total  number  of  node-node  interactions  computed 
each  timestep  is 


Contoured  Interactions  =  X,  x  Nn  — 


XC(XC-I): 


:i(2.v«  - 1)-:  x  [x-] 


-  Xg[2N;  -  2.Va;  X  ;.Y;:. 


(A.2) 


)  follows  from  (A.2)  by 


Letting  .V  represent  the  total  number  of  nodes,  i.e.  .V  =  .Y“,  (3.2 
setting  -Vs  =  3. 
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FIG.  3.  Frequencies  for  neighbor  separations  at  short  range  corresponding  to  relatively 
irge  index  onsets,  i.e.  (-3,  1.0),  from  the  targe:  ceil.  These  frequencies  are  the  same  as 
hose  shewn  in  Fig  2  for  the  range  cf  distance  0  to  10  A. 
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FIG.  12a.  Steps  of  interaction  calculation  between  two  logical  planes. 
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